In an attempt to increase the efficiency of mastery testing while maintaining
a high level of confidence for each mastery decision, the theory and
technology of item characteristic curve (ICC) response theory (Lord & Novick,
1968) and adaptive testing were applied to the problem of judging individuals'
competencies against a prespecified mastery level to determine whether each
individual is a "master" or a "nonmaster" of a specified content domain. Items
from two classroom mastery tests conventionally administered in a
military training environment were calibrated using the unidimensional
three-parameter logistic ICC model. Then, using response data originally obtained
from the conventional administration of the tests, a computerized adaptive
mastery testing (AMT) strategy was applied in a real-data simulation.

The AMT procedure used ICC theory to transform the arbitrary "proportion
correct" mastery level used in traditional mastery testing to the ICC
achievement metric in order to allow the adaptation of the test to each trainee's
achievement level estimate, which was calculated after each item response.
Adaptive testing continued until the 95% Bayesian confidence interval around 
the trainee's achievement level estimate failed to contain the prespecified 
mastery level. At that point testing was terminated, and a mastery decision 
was made for the trainee. 

Results obtained from the AMT procedure were compared to results obtained 
from the traditional mastery testing paradigm in terms of the reduction in 
mean test length, information characteristics, and the correspondence between 
decisions made by the two procedures for three different mastery levels and 
for each of the two tests. The AMT procedure reduced the average test length 
30% to 81% over all circumstances examined (with modal test length reductions 
of up to 92%), while reaching the same decision as the conventional procedure
for 96% of the trainees. 

Additional advantages and possible applications of AMT procedures in 
certain classroom situations are noted and discussed, and further research 
questions are suggested. 
An Adaptive Testing Strategy for
Mastery Decisions 



During the past 15 years, considerable interest in the psychological and 
educational measurement community has been directed toward the evaluation of 
student competency in various fields of study. In the simplest case, compe- 
tency in a field has been operationalized as some minimum skill level above 
which a student is declared a "master" and below which a student is declared 
a "nonmaster." Mastery testing has been developed as an implementation of the 
more general criterion-referenced test interpretation model formulated by 
Glaser and Klaus (1962) and expanded upon by many since then (e.g., Hambleton, 
Swaminathan, Algina, & Coulson, 1978; Popham, 1971; Popham & Husek, 1969). 

"Mastery" has typically been defined by subject matter experts as the min- 
imum percentage of items that a student should be able to answer from a given 
set of test items in order to be classified as proficient. Therefore, a stu- 
dent who correctly answered only the minimum acceptable percentage of items on 
a test of this type would be declared a master, and a student who correctly 
answered one item less would be declared a nonmaster in the subject matter area.
So that all of the mastery decisions made would be comparable, mastery testing 
has traditionally required all students to answer the same set of test questions. 

This approach to mastery testing has several problems. First, a student 
whose test score is far above the specified cutoff score would be said to be 
a master of the subject matter; similarly, a student whose score was just bare- 
ly above the cutoff score would also be declared a master, but presumably that 
decision would be made with less confidence. Thus, classical mastery testing 
results in different levels of intuitive confidence for students whose raw 
scores fall at different distances above or below the cutoff, which results in 
decisions with different dependabilities for students with different raw scores. 

This problem has been discussed on the group level by Livingston (1972) in 
a study discussing the reliability of criterion-referenced tests as a function 
of the mean score level of the testee group. Hambleton and Novick (1973) and 
Davis and Diamond (1974) have specified methods to develop cutoff rules designed 
to yield certain desired ratios of false positive and false negative decisions 
through the use of the differential accuracy of decisions made at different raw 
score levels, but little research has been directed toward equalizing the con- 
fidence levels in decisions made by a mastery test across all levels of per- 
formance. Hambleton and Novick (1973) have suggested that the use of Bayesian 
point estimation of students' mastery scores might improve the accuracy of mas- 
tery decisions; it will be shown in this report that the use of Bayesian con- 
fidence interval estimates may be useful in equalizing the confidence in de- 
cisions made across all levels of observed performance. 

A second problem with the classical mastery testing paradigm is that each 
student tested is given the same set of test questions, even though the set of 
questions may be inappropriate for any reasonably precise measurement at some 
achievement levels. In the mastery testing area, attempts have been made to
adapt the test to each student (e.g., Ferguson, 1970); but these attempts have
almost universally assumed that all items administered were of equal quality. 
It is possible, through the use of item characteristic curve (ICC) response 
theory (Lord & Novick, 1968), to distinguish between items which yield differ- 
ent amounts of information concerning different trait levels. 

Several authors (e.g., Bejar, Weiss, & Gialluca, 1977; McBride & Weiss,
1976; Urry, 1977) have demonstrated that adaptive testing procedures using ICC
response theory can reduce test length with no reduction in measurement
precision. These testing procedures adapt the difficulty and information
characteristics of each individual's test by drawing from large item pools items
that are matched to the individual's estimated trait level. These results
indicate that by making use of all of the information available about the test
items and the individual's estimated achievement levels, the application of 
adaptive testing procedures using ICC response theory to a traditional mastery 
testing situation might result in a decrease in the test length needed to make 
confident decisions concerning each individual's mastery status. 



This report describes the design and application of an adaptive mastery 
testing strategy that eliminates these problems of the traditional mastery test- 
ing approach. The adaptive mastery testing strategy is designed to reduce the 
average test length for each student, while equalizing the level of confidence
in decisions made across the entire range of the achievement continuum. This 
report compares the performance of the conventional and adaptive mastery testing 
procedures within the context of one course of instruction in terms of
efficiency, information characteristics, and level of correspondence between
mastery decisions.



The adaptive mastery testing (AMT) procedure is designed to administer 
achievement test items selected from a classical mastery test, but not all items 
are administered to each student. The test items administered to a given stu- 
dent are selected to provide the most information concerning the achievement 
level of that student. Mastery decisions are made with a specified degree of 
confidence for each student, using a cutoff point prespecified on the achievement 
continuum.

There are three important components of the AMT procedure. The first in- 
volves converting the mastery level to the achievement metric. The second com- 
ponent is the item-selection technique used to determine which items should be 
administered to a specific student. The final component of the AMT strategy 
involves the manner in which the mastery decision is made and the degree of con- 
fidence that can be placed in the decision once it has been made. 

The classical mastery testing procedure specifies a percentage of the items
on a test that must be correctly answered by a student in order to be declared
a master. Using ICC theory, it is possible to generate an analogue to the
"percentage" cutoff of classical theory for use in adaptive testing. This is
necessary, since in an adaptive test each individual will tend to answer about 50%
of the items correctly, given a large enough item pool, because the items
administered will be selected to be close to the individual's achievement level
(Vale & Weiss, 1975; Weiss, 1973). The ICC analogue of proportion correct is
based on the use of the test characteristic curve (TCC). The TCC is the
function that relates the ICC achievement continuum to the expected proportion of
correct answers that an individual at any achievement level may be expected to
obtain if all of the items on the test were administered.

For this study the assumption was made that a three-parameter logistic 
ogive would describe the functional relationship between the latent trait 
(achievement) and the probability of observing a correct response to any of the
items on the test. This assumption yields a TCC of the following form: 



\[
E(P \mid \theta) = \frac{1}{n} \sum_{i=1}^{n} \left\{ c_i + (1 - c_i)\,
\frac{\exp[1.7\, a_i (\theta - b_i)]}{\exp[1.7\, a_i (\theta - b_i)] + 1} \right\}
\]                                                                        [1]

where

$E(P \mid \theta)$ = the expected value of the proportion of correct answers observed
on the test, given an achievement level;
$a_i$ = the estimate of the ICC discrimination parameter for item i;
$b_i$ = the estimate of the ICC difficulty parameter for item i;
$c_i$ = the estimate of the lower asymptote of the ICC for item i;
$n$ = the number of items on the test; and
$\theta$ = a given achievement level.



Thus, as Equation 1 indicates, the expected proportion correct at a given level
of achievement (θ) is the average, over all items in the test, of the
probability of a correct response for each item, given the three ICC item parameters for
each item and assuming a logistic ICC.



This monotonically increasing function permits relating any achievement
level to its most likely proportion correct or, more importantly in this
context, determining the achievement level (θ) which will most probably result in
any given proportion of correct answers. An example of the use of the TCC in
determining an achievement level that is comparable to a desired "percentage"
cutoff is shown in Figure 1 using a hypothetical TCC. To determine a level of
achievement that corresponds to, for example, a 70% mastery level on the test
items which comprise the TCC, these steps would be followed:

1. Draw a horizontal line (line A in Figure 1) from the .7 mark on
the vertical (expected proportion correct, or P) axis of the TCC plot
to the TCC.

2. Drop a vertical line (line B) from the point of intersection of the
TCC and the horizontal line drawn in Step 1 to the horizontal
(achievement level, or θ) axis. This point (θ_m) on the achievement level axis
is designated the mastery level using the achievement metric.
Figure 1

Hypothetical Test Characteristic Curve Illustrating 
Conversion from a Proportion Correct Mastery Level 
to the Achievement Metric 
3. The cutoff point specified in Step 2 may now be used to make mastery
decisions in place of the P=.7 mastery level originally specified.
Once the mastery level is expressed in the achievement metric (θ),
rather than in terms of proportion correct, it is no longer necessary
to administer all the items in the test to obtain an achievement
level estimate for an individual and a corresponding mastery
decision. An achievement level estimate can then be obtained using any
subset of items from the original test, provided that the individual's
item responses are scored with a method that will put the achievement
level estimate on the same metric as the TCC. Any ICC-based scoring
procedure (Bejar & Weiss, 1979), in conjunction with the original item
parameter estimates, will result in an achievement level estimate
which will be on the θ metric.

This procedure allows conversion of any desired proportion correct mastery 
level to the θ metric. Once this conversion is made, ICC theory and adaptive
testing strategies may be used to increase the efficiency of mastery testing 
techniques. 
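
The graphical conversion described above can also be carried out numerically. The following is a minimal sketch, not the procedure used in the study: it assumes the three-parameter logistic TCC of Equation 1 and finds the mastery level by bisection, relying on the fact that the TCC is monotonically increasing. The function names (`icc_3pl`, `tcc`, `mastery_level`), the search interval, and the five sets of item parameters are hypothetical and chosen only for illustration.

```python
import math

def icc_3pl(theta, a, b, c, D=1.7):
    """Three-parameter logistic ICC: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def tcc(theta, items):
    """Expected proportion correct at theta: the mean ICC over all items."""
    return sum(icc_3pl(theta, a, b, c) for a, b, c in items) / len(items)

def mastery_level(p_cut, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Find theta_m with tcc(theta_m) = p_cut by bisection on [lo, hi]."""
    if not tcc(lo, items) < p_cut < tcc(hi, items):
        raise ValueError("cutoff is outside the range of the TCC on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < p_cut:
            lo = mid   # the TCC is still below the cutoff: move right
        else:
            hi = mid   # the TCC has reached the cutoff: move left
    return (lo + hi) / 2.0

# Hypothetical (a, b, c) estimates for a five-item test; not from the study.
items = [(1.0, -1.0, 0.2), (1.2, -0.5, 0.2), (0.9, 0.0, 0.2),
         (1.1, 0.5, 0.2), (1.3, 1.0, 0.2)]
theta_m = mastery_level(0.70, items)
print(round(theta_m, 3), round(tcc(theta_m, items), 3))
```

Bisection is used here only because it needs nothing beyond monotonicity; any one-dimensional root finder would serve equally well.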

Adaptive Item Selection and Scoring

To make mastery testing a more efficient process, the objectives of the
AMT strategy were (1) to reduce the length of each student's test by
eliminating test items which provided little information concerning the student's
achievement level and (2) to terminate the AMT procedure after enough infor- 
mation had been obtained so that the mastery decision could be made with a 
high degree of confidence. 

To operationalize the first objective, items were selected to be
administered to each student at each point during the testing procedure on the basis of the
amount of information that the item provided concerning the student's achieve- 
ment level estimate at that point in testing. The administration of the test 
item which provides the most information concerning the student's present achieve- 
ment level estimate should provide the most efficient use of testing time. A 
procedure that selects and administers the most informative item at each point 
in an adaptive testing procedure was described by Brown and Weiss (1977), and 
this procedure was used in the present study. This procedure uses an adaptive 
maximum information search and selection (MISS) technique for the sequential 
selection of test items to be administered to each individual. 

Item selection. The information that an item provides at each point along 
the achievement continuum can be determined from the ICC parameters of the item. 
Using the unidimensional three-parameter logistic ICC model (Birnbaum, 1968) to 
describe responses to the five-alternative multiple-choice items used in this 
study, the information available in any item is (Birnbaum, 1968, Equation 20.4.16) 



\[
I_i(\theta) = \frac{(1 - c_i)\, D^2 a_i^2\, \psi^2[D L_i(\theta)]}
{\psi[D L_i(\theta)] + c_i\, \Psi^2[-D L_i(\theta)]}
\]                                                                        [2]

where

$I_i(\theta)$ = the information available from item i at any achievement level θ;
$a_i$ = the ICC discrimination parameter of the item;
$c_i$ = the lower asymptote of the ICC for the item;
$D$ = 1.7, a scaling factor used to allow the logistic ICC to closely
approximate a normal ogive;
$L_i(\theta)$ = $a_i(\theta - b_i)$, where $b_i$ is the ICC difficulty parameter of the item;
$\psi$ = the logistic probability density function; and
$\Psi$ = the cumulative logistic function.
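
As an illustration of how Equation 2 drives item selection, the sketch below evaluates the information of each unused item at the current achievement level estimate and returns the most informative one. It is an assumption-laden outline of the maximum-information idea, not the MISS program of Brown and Weiss (1977); the function names and the three-item pool are hypothetical.

```python
import math

D = 1.7  # scaling factor from Equation 2

def logistic_cdf(x):
    """Cumulative logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def item_information(theta, a, b, c):
    """Information of a three-parameter logistic item at theta (Equation 2)."""
    x = D * a * (theta - b)
    cdf = logistic_cdf(x)
    pdf = cdf * (1.0 - cdf)  # logistic probability density function
    return (1.0 - c) * D**2 * a**2 * pdf**2 / (pdf + c * logistic_cdf(-x) ** 2)

def most_informative(theta, pool, used):
    """Index of the not-yet-administered item with maximum information at theta."""
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

# Hypothetical (a, b, c) parameters; not taken from the study's item pool.
pool = [(1.0, -2.0, 0.2), (1.0, 0.0, 0.2), (1.0, 2.0, 0.2)]
print(most_informative(0.0, pool, used=set()))
print(most_informative(0.0, pool, used={1}))
```

With a current estimate of 0.0, the item whose difficulty is nearest the estimate is chosen first; once it has been used, the easier of the two remaining items is more informative than the harder one because of the nonzero lower asymptote.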



provided an explanation and scoring programs for Owen's method.
Owen's θ estimation procedure has been shown to yield biased estimates of
trait levels (Kingsbury & Weiss, 1979; Lord, 1976; McBride & Weiss, 1976).
This bias may be attributed to the assumption of a normal distribution of θ in
the population made by Owen's procedure (Lord, 1976) and/or to inappropriate
prior information concerning θ on the individual level (Kingsbury & Weiss, 1979).
The bias inherent in this scoring method may render the MISS technique less 
efficient than it would be under optimal conditions, and thereby may reduce the 
efficiency of the AMT technique as a whole. 

To use MISS under optimal conditions, θ estimates should be obtained through
the use of a maximum likelihood estimation technique, which yields
asymptotically efficient estimates (Birnbaum, 1968). Maximum likelihood θ estimation
techniques are not able, however, to obtain trait level estimates for consistent
item response patterns (either all correct or all incorrect responses) or for
item response patterns for which the likelihood function is extremely flat.
Owen's Bayesian scoring method will yield an estimate for any response pattern.
The inability of the maximum likelihood procedures to estimate θ for some
response patterns militates against the use of a maximum likelihood estimation
procedure in this situation, since it would be necessary to assign arbitrary θ
estimates during the early stages of item selection and scoring. Thus, the
Bayesian scoring procedure was used in order to obtain θ estimates for each
student after each item administered by the adaptive testing procedure, even
though some efficiency might have been lost in the AMT due to the bias inherent
in the estimation procedure. Use of the Bayesian θ estimation procedure in
this study also allowed the use of easily interpretable Bayesian confidence
intervals to make the mastery decision.

Bayesian Confidence Intervals: Making the Mastery Decision

Any achievement level estimate (θ̂) obtained using ICC-based scoring of any
subset of the items from the original test and their ICC item parameters will
be on the same metric as the TCC for the original test. This allows immediate
comparison between any achievement level estimate (θ̂) and any point on the
achievement metric (e.g., θ_m). However, two different subsets of items may
result in achievement level estimates that are not equally informative. For
example, if one test consisted of many items that were too easy for a given
individual and the other used the same number of equally discriminating items at
about the appropriate difficulty level for that individual, the second test
would yield a much more accurate achievement level estimate for that individual.
Achievement level estimates that are on the same metric are comparable if their
differential precision is taken into account. To do this, confidence interval
estimates for the θ's should be compared instead of the point estimates (θ̂).
For this reason, the AMT strategy makes mastery decisions with the use of
Bayesian confidence intervals.

After each item was selected using MISS and administered to a student, a
point estimate of the student's achievement level (θ̂) was determined using
Owen's Bayesian scoring algorithm and the responses obtained from all items
previously administered. Given this point estimate and the corresponding
variance estimate for θ̂, also obtained using Owen's procedure (see Brown &
Weiss, 1977, Equations 3 and 5, pp. 4-5), a Bayesian confidence interval may
be defined such that:

\[
\hat{\theta}_i - 1.96\,(\sigma_i^2)^{1/2} \le \theta \le
\hat{\theta}_i + 1.96\,(\sigma_i^2)^{1/2}, \quad \text{with } P = .95,
\]                                                                        [3]

where

$\hat{\theta}_i$ = the Bayesian point estimate of achievement level calculated
following item i,
$\sigma_i^2$ = the Bayesian posterior variance estimate following item i, and
$\theta$ = the true achievement level.



This statement may be interpreted as meaning that the probability that the true
value of the achievement level parameter, θ, is within the bounds of the
confidence interval is .95. Alternatively, it might also be concluded with 95%
confidence that the true parameter value (θ) lies within the confidence interval.
Confidence intervals at differing confidence levels can be constructed using
appropriate z-values from a normal distribution in place of the 1.96 in
Equation 3.

After this confidence interval has been generated, it can be determined
whether or not θ_m, the achievement level earlier designated as the mastery
level using the TCC (see Figure 1), falls outside the limits of the confidence
interval. If it does not, another item is administered to the student, and the
confidence interval is recalculated using the updated θ̂ and its updated
variance. This procedure continues until, after some item has been administered,
the confidence interval calculated does not include θ_m, the mastery level on
the achievement continuum. At this point testing is terminated, and a mastery
decision is made. If the lower limit of the confidence interval falls above
the specified mastery level, θ_m, the student is declared a master. If, on the
other hand, the upper limit of the confidence interval falls below θ_m, the
student is declared a nonmaster. Given a finite item pool, the testing
procedure may, in some cases, exhaust the item pool before a decision can be made.
This will occur for students with θ values close to θ_m. It is possible to make
a mastery decision for these students based simply on whether the Bayesian point
estimate of their achievement level (θ̂) is above or below θ_m. However, for
these students, mastery decisions will not be made with the same confidence
levels as those made for students for whom the confidence interval falls completely
above or below θ_m.
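
The decision rule just described can be written compactly. The sketch below is an illustration only, assuming that a point estimate and posterior variance are already available from whatever scoring routine is in use; it does not implement Owen's procedure. The function names are hypothetical, the z-value is obtained from the standard normal distribution so that confidence levels other than 95% can be used, and the treatment of an estimate exactly equal to the mastery level in the fallback rule is an assumption, since the text does not address ties.

```python
from statistics import NormalDist

def mastery_decision(theta_hat, post_var, theta_m, confidence=0.95):
    """Return 'master', 'nonmaster', or None if testing should continue."""
    # Two-sided z-value, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * post_var ** 0.5
    if theta_hat - half_width > theta_m:   # whole interval above the mastery level
        return "master"
    if theta_hat + half_width < theta_m:   # whole interval below the mastery level
        return "nonmaster"
    return None                            # interval still contains the mastery level

def forced_decision(theta_hat, theta_m):
    """Fallback when the item pool is exhausted; ties go to 'nonmaster' (assumed)."""
    return "master" if theta_hat > theta_m else "nonmaster"

# Hypothetical estimates and variances, with a mastery level of .50.
print(mastery_decision(1.9, 0.25, 0.5))
print(mastery_decision(-0.3, 0.09, 0.5))
print(mastery_decision(0.4, 1.0, 0.5))
```

A return value of `None` signals that another item should be administered; `forced_decision` applies only when no items remain.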
Figure 2 shows the result of the AMT procedure for two hypothetical testees,
A and B. Achievement level point estimates (θ̂) and error bands, which
indicate the appropriate Bayesian confidence intervals, are shown for each
testee after each item was administered. An arbitrary mastery level, θ_m = .50,
was chosen for this example; normally, however, the mastery level would be
determined by the TCC transformation of an existing proportion correct mastery
criterion.

For Testee A, the first θ estimate was below θ_m, but the confidence
interval around this estimate contained θ_m. Thus, the θ estimate was not precise
enough to make a confident decision; consequently, testing continued for
Testee A. After each item was administered, a new θ estimate and a corresponding
confidence interval were calculated. For the first 6 items administered to
Testee A, the confidence interval around the θ estimate contained θ_m, and
testing continued. After the administration of the 7th item, the entire confidence
interval around the θ estimate for Testee A was above θ_m. This implied that
the θ estimate was precise enough to allow a confident decision to be made for
Testee A. Testee A was declared a master at this point, and testing was
terminated.



Figure 2

Example of the AMT Procedure: Achievement Level Point Estimates and
Bayesian Confidence Intervals after Each Item Administered to
Two Hypothetical Testees, Testee A and B

(Horizontal axis: Number of Items Administered)

For Testee B, the same type of procedure was followed. For the first 13
items administered to Testee B, the confidence interval around θ̂ contained θ_m.
The next item administered to Testee B resulted in a θ̂ and confidence interval
which fell completely below θ_m. At that point, testing was terminated and
Testee B was declared a nonmaster of the subject area.

It should be noticed that the final θ estimate for Testee B (θ̂ = -.30) was
much closer to the mastery level than the final θ estimate for Testee A
(θ̂ = 1.9). Therefore, more precise measurement was needed for Testee B
than for Testee A to make mastery decisions with comparable confidence levels,
and several more items were administered to Testee B than to Testee A to
obtain the additional precision needed in order to make the mastery decision.



The AMT strategy was evaluated using real-data simulation (Weiss, 1973).
In this approach, test item response data obtained from the administration of 
a conventional paper-and-pencil multiple-choice achievement test were used to 
simulate the administration of the AMT strategy. That is, items were selected 
by the AMT strategy for each student from the conventional test already
administered. Item responses obtained in the conventional test were used by the AMT
strategy and scored as described above. If a mastery decision could not be 
made after a given item was used, another item from the conventional test was 
selected by the MISS approach, and the previously obtained item response was
used by the AMT strategy. This procedure was continued until the AMT strategy 
could make a mastery decision or until all items in the conventional test pool 
had been administered. 

Subjects and Tests 

Item response data were obtained from trainees undergoing the Weapon Me- 
chanics course at the Lowry Air Force Base Technical Training Center during 
1977 and 1978. This course is computer-managed, and trainees proceed at their 
own pace through 13 well-specified blocks of instruction. During each block, 
several tests are given from which mastery decisions are made. Trainees are
given several attempts to pass each test in each block. 

For this study two block tests of different lengths were arbitrarily chosen
to investigate the properties of the AMT procedure. Specifically, data used 
were the item responses of 200 trainees to the first test in the first block of 
instruction (Test 11) and the item responses of 200 trainees to the first test 
in the third block of instruction (Test 31). These tests consisted of 30 and 50 
conventionally administered 5-alternative multiple-choice items, respectively.
Only the trainees' performances in their first attempt to pass the tests were 
used for this study. 

Fitting the ICC Response Model 

Estimation of item parameters. The procedure used for the estimation of 
the three item parameters of the logistic ICC response model was developed by 
Urry (1976). This procedure obtains initial estimates for the discrimination
(a) and the difficulty (b) parameters for an item through the use of a direct 
conversion of the classical item parameters and the individuals' raw scores 
(number correct). A value of the lower asymptote parameter (c) is found which
minimizes a χ² goodness-of-fit statistic for the item. These initial values
are made more precise through the use of an ancillary correction procedure 
(Fisher, 1950). To obtain more precise estimates of the parameters, the entire 
procedure is repeated replacing the individuals' raw scores with Bayesian modal 
estimates (Samejima, 1969) of their achievement levels. 

Urry's item parameterization method excludes items which meet any of the
following rejection criteria during the first stage of the procedure: 

1. a less than .80, 

2. b less than -4.00 or greater than 4.00, and 

3. c greater than .30.

If an item is excluded on the basis of one of these criteria during the initial
stage of the parameterization procedure, it receives no parameter estimates in
either stage of the procedure. These restrictive criteria are removed after
the first phase of the calibration, and no further culling of the items is done.
Thus, the final values of the parameter estimates for those items which survive
the first phase are not constrained by the rejection criteria.
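
The first-stage screening can be expressed as a simple filter. This is only a sketch of the three rejection criteria listed above, not Urry's calibration program; the function name, item labels, and initial estimates are hypothetical.

```python
def fails_first_stage(a, b, c):
    """True if the initial estimates meet any of the three rejection criteria."""
    return a < 0.80 or b < -4.00 or b > 4.00 or c > 0.30

# Hypothetical initial (a, b, c) estimates keyed by made-up item labels.
initial = {
    "item01": (1.10, 0.25, 0.20),   # passes all three criteria
    "item02": (0.65, -0.40, 0.18),  # a less than .80
    "item03": (1.30, 4.50, 0.22),   # b greater than 4.00
    "item04": (0.95, 1.10, 0.35),   # c greater than .30
}
retained = [name for name, est in initial.items() if not fails_first_stage(*est)]
print(retained)
```

Only `item01` survives in this example. Because the criteria are applied to the initial estimates only, a retained item's final estimates may later fall outside these bounds.
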
Evaluating the fit of the model. To examine the usefulness and
appropriateness of the unidimensional three-parameter logistic ICC model with data of the
type provided by the Weapon Mechanics course, two questions were investigated:

1. Does factor analysis of the intercorrelations between item responses
result in only a single common factor? That is, is the use of a
unidimensional model justified by the presence of only a single nonrandom
dimension?

2. Do parameter estimates obtained from these data correspond to the 
range of parameter estimates obtained in previous studies that have 
shown this type of model to be useful in increasing testing efficiency? 

To answer the first question, principal axis factor analyses were performed
separately on the data from Test 11 and Test 31. Matrices of item
intercorrelations (phi coefficients) were calculated from the raw item-response data for
the 200 trainees on each of the tests using the PEARSON CORR computer subroutine
from the Statistical Package for the Social Sciences (SPSS; Nie, Hull, Jenkins,
Steinbrenner, & Bent, 1970).

The resultant 30 x 30 (Test 11) and 50 x 50 (Test 31) item intercorrelation
matrices were each factor analyzed by the iterative principal axis factor
analysis subroutine from SPSS. The initial communality estimate for each of the items
was the squared multiple correlation of the item with all other items in the
test. The analysis iterated until successive communality estimates differed by
a negligible amount.

To determine the amount of random variation in the final factor-analytic 
solutions, parallel analyses were conducted following the suggestion of Horn 
(1965). This entailed factor analyses of sets of random data that were
generated to parallel the original data, using the same number of "items" and
"subjects." Eigenvalues obtained for factors in the random data were used to
determine whether factors obtained from the analysis of the real data were "true"
factors or residual factors. If the eigenvalue of a factor obtained from the
real data was larger than that for the corresponding random-data factor, the 
real-data factor was considered to be a true factor; but if the eigenvalue was 
similar to that obtained from the random-data factor, then the real-data factor 
was considered to be a residual factor of no real importance. 

To answer the second question posed above, the parameter estimates ob- 
tained for these two tests were compared to the estimates obtained in two other 
studies (Bejar, Weiss, & Kingsbury, 1977; Brown & Weiss, 1977) that used a
unidimensional three-parameter logistic ICC model to attempt to improve testing
accuracy in achievement testing situations. Further comparisons were made be- 
tween the parameter estimates obtained from the present data and the guidelines 
expressed by Urry (1977) to indicate whether the use of an adaptive testing item 
pool will improve the quality or efficiency of trait measurement. Urry's guide- 
lines are as follows: 

1. The a parameter estimates of the items in the pool should exceed .80.

2. The b parameter estimates should be widely and evenly distributed
between -2.00 and +2.00.

3. The c parameter estimates should be less than .30.

To the extent that parameter estimates obtained from Tests 11 and 31 followed
Urry's guidelines and showed close correspondence to other item pools that have
proven to be useful in adaptive testing, it could be concluded that the items
used in this study would show some usefulness with the unidimensional
three-parameter ICC model.

Simulation of AMT

In order to simulate the AMT strategy, a computer program was designed to
administer the one item in the item pool (which included all of the items
from the conventional test not rejected by the calibration procedure) providing
the most information at a trainee's current level of θ̂. Each trainee began the
test with a θ̂ of 0.0 and a prior variance of 1.0. The trainee's response, taken
from his/her original responses to the conventional test, was used by the
Bayesian scoring routine to produce a new θ estimate. Then the item with the most
information at this new θ̂ was chosen to be administered next. (No item was
administered more than once to a trainee.) A new θ estimate was found using the
trainee's response to this item, and then another item was chosen based on the
new θ estimate.

The program continued to choose items to be administered until the trainee's
θ was shown to be either above or below a given mastery level, θ_m, with a pre-
specified degree of confidence. A 95% Bayesian symmetric confidence interval
was calculated around the trainee's θ estimate after each item was administered. The
AMT strategy continued until this confidence interval failed to include the pre-
specified mastery level; when this occurred, the AMT procedure was terminated.
A lower limit of three items was set for the length of the AMT to avoid anomalous
results that might occur from making mastery decisions based on a small number
of item responses. For trainees for whom a mastery decision could not be made
with the AMT procedure before all items were administered, mastery was deter-
mined by whether the final θ estimate was above or below θ_m.
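
The termination rule above might be expressed as in the sketch below. It assumes that the scoring routine supplies a posterior mean and variance for θ and that the symmetric 95% interval is the mean plus or minus 1.96 posterior standard deviations; the function name, its arguments, and the handling of an exact tie at pool exhaustion are assumptions made for illustration.

```python
import math


def amt_decision(theta_hat, post_var, theta_m, n_items, pool_size,
                 min_items=3, z=1.96):
    """Return (decision, high_confidence), or None if testing should continue.

    theta_hat, post_var: posterior mean and variance of the achievement estimate.
    theta_m: mastery level on the achievement metric.
    """
    half_width = z * math.sqrt(post_var)  # assumes a normal-shaped posterior
    lower, upper = theta_hat - half_width, theta_hat + half_width
    interval_excludes_cutoff = not (lower <= theta_m <= upper)

    if n_items >= min_items and interval_excludes_cutoff:
        return ("master" if lower > theta_m else "nonmaster", True)
    if n_items >= pool_size:
        # Pool exhausted: fall back on the point estimate.
        # Treating an exact tie as nonmastery is an assumption.
        return ("master" if theta_hat > theta_m else "nonmaster", False)
    return None


early_stop = amt_decision(0.5, 0.04, -0.23, n_items=5, pool_size=25)
too_short = amt_decision(0.5, 0.04, -0.23, n_items=2, pool_size=25)
exhausted = amt_decision(-0.2, 0.25, -0.23, n_items=25, pool_size=25)
```

The second element of the returned pair distinguishes decisions made with the interval excluding the mastery level from decisions forced by exhausting the item pool.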

During the simulation, three different mastery levels were used correspond- 
ing to proportion correct mastery levels of P=.7, .8, and .9. These mastery 
levels were calculated from the TCC for each test, as described above. To max- 
imize the comparability between the conventional and adaptive mastery testing 
strategies, the conventional test was truncated to include only the items which 
were not rejected by the calibration procedure. In addition, the conventional 
test was scored by Owen's Bayesian scoring method, and the same mastery levels 
were used for both testing strategies. 

Comparison of Efficiency: AMT versus Conventional Testing

If the AMT strategy were a more efficient testing procedure than the con-
ventional mastery testing procedure, it would reduce test length while adminis-
tering items with high enough information to maintain a very high correlation 
between decisions made by the AMT and the conventional approach. Consequently, 
to determine whether the AMT procedure reduced the number of items given to 
trainees without reducing the quality of the mastery decisions made for those 
trainees, three criteria were evaluated separately for Test 11 and Test 31 for 
the AMT and conventional testing procedures at each of the mastery levels:

1. The mean number of items administered to trainees,

2. The mean information obtained after all items were administered, and

3. Relationships between mastery decisions made at the termination of 
the testing by the AMT and conventional procedures. 
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Figure 3 

Eigenvalues of the First 10 Common Factors Extracted From Item 
Intercorrelations for Test 11 and Test 31 and for Parallel Random-Data Factors 



[Plot not reproduced. Panels: (a) Test 11 and (b) Test 31. Vertical axis: eigenvalue; horizontal axis: factor number (1 to 10). Each panel plots the test data against random data.]



In addition, to examine the characteristics of the two testing procedures more 
closely, the mean information obtained from each procedure was plotted for each 
testing strategy as a function of the achievement level estimate for each mas- 
tery level. 

Results

Applicability of the ICC Model

Factor analysis. Eigenvalues of the first 10 factors extracted from
item intercorrelations for Test 11 and Test 31 and the random data parallel
analysis for each test are shown in Appendix Table B-1; these values are plotted
in Figure 3. For Test 11 (Figure 3a) the first three factors had higher eigen-
values than their corresponding random-data factors. However, only the first
factor differed substantially from the corresponding random-data factor. Thus,
for Test 11 it was not unreasonable to infer that only the first factor was a
"true" factor underlying trainees' responses, since the eigenvalues of the
other factors resembled those of the random factors and the first factor account-
ed for more than three times as much common variance as any other factor.

For Test 31 (Figure 3b) the eigenvalues of the first five factors extracted 
each exceeded the eigenvalues of their corresponding random factor, but only 
the first two factors exceeded the random-data values by a substantial amount. 
The first factor accounted for 20.5% of the common variance extracted by the
10-factor solution, and the second factor accounted for 6.2% of the common 
variance. No other factor accounted for more than 5% of the variance. These 
data indicate that there were probably two real factors underlying trainees' 
responses to Test 31. This two-factor solution might indicate that a multi- 
dimensional latent trait model should be postulated to explain trainees' re- 
sponses to Test 31. However, because the first factor accounted for over three 
times as much variance as the second factor, the unidimensional model could 
still be used; data presented by Reckase (1978) indicate that if a dominant 
first factor exists, items calibrated using a unidimensional model will adequate- 
ly measure that first factor. 

Estimation of the ICC parameters. Tables 1 and 2 show the ICC parameter 
estimates obtained for each of the items in Test 11 and Test 31, respectively. 
Of the items in the conventional test, 17% (5 items) from Test 11 were rejected 
by the parameterization procedure, while 24% (12 items) were rejected for Test 31.
These losses are comparable to losses observed during other investigations of 
achievement tests using this parameterization procedure; Bejar, Weiss, and 
Kingsbury (1977) lost 22% of their total pool during item parameterization, and 
Brown and Weiss (1977) lost 13% of their total pool. 

For Test 11, values of the a parameter estimates ranged from .63 to 4.69, 
with a mean of 1.48 and a standard deviation of .98. Values of the b parameter 
estimates ranged from -2.35 to 1.32, with a mean of -.98 and a standard devia-
tion of 1.01. Values of the c parameter estimates ranged from .00 to .49, with
a mean of .27 and a standard deviation of v - 1 38 . 

For Test 31, values of estimates of the a parameter ranged from .63 to
3.42, with a mean of 1.16 and a standard deviation of .65. Values of the b para- 
meter estimates were from -1.86 to 3.18, with a mean of -.58 and a standard 
deviation of 1.08. The c parameter estimates ranged from .00 to .77, with a 
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Table 1 

ICC Item Parameter Estimates for the Items in Test 11
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27 
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1.32 


.46 


29 


.83 


-1.69 


.09 


30 


1.31 


-.46

Missing values indicate that the item was rejected by the para-
meter estimation procedure.

mean of .28 and a standard deviation of .16. For both of these tests the para- 
meter estimates obtained were well within the range established by two earlier 
studies that examined achievement tests using the same item parameterization 
method (Bejar, Weiss, & Kingsbury, 1977; Brown & Weiss, 1977). 

Examination of the item parameter estimates obtained from Test 11 and 
Test 31, using Urry's guidelines for a good adaptive testing item pool, indicated
the following: 

1. For both Test 11 and Test 31, 76% of the items had a values exceeding 
.80, while the average value for both tests exceeded 1.00. 

2. The b values were fairly widely and evenly distributed between -2.0
and 1.0, but the distribution was rather sparse above 1.0. Consider- 
ing the small numbers of items in the two item pools, the distribution 
of the b values seems appropriate, though the pools might have been 
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Table 2 

ICC Item Parameter Estimates for the Items in Test 31



Item Number 

1 

2 
3 
4 
5 
6 
7 
8 
9 
10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 

21 

22 
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25 

26 

27 

28 

29 
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31 

32 

33
42
43
45
47
48
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.81 
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1.01 
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1.07 
3.42



1.04
1.18
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-1. 39 

-.44
-1.43
-.46 
-.49 
.26 
-1.04 
-.65 
-1.11 
.98 

.09 
-.23 
-1.64
-1.56 
-.44 
-1.54 

.40 
1.13 
.45 
-1.74 



-.77 
-.49 
-.97 
-1.83 
-.56 
.80 
.29 
1.06



Lower Asymptote

.33 

.77 
.37 
.14 
.39 
.38 
.35 

.37 

.36 
.01 

.13 
.23 
.38 
.13 
.16 
.35 
.15 
.19 
.27 

.41 
.39 
.20 
.14 
.11 
.06 

.17 
.45 
.27 



.37 
.39 
.36 
.21 
.38 
.37 
.42 
.33 



Missing values indicate that the item was rejected by the parameter 
estimation procedure. 
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slightly too easy to meet Urry's second guideline. However, Urry's 
guidelines were proposed for ability tests for which it is desired to 
measure precisely across a wide range of ability, whereas the data of 
this study were from a mastery achievement test for which it was de- 
sired to classify students on either side of a mastery level. Thus, 
the distribution of b values would not be expected to conform with 
Urry's second recommendation.
3. Fifty-six percent of the items in Test 11 and 47% of the items in 
Test 31 obtained c estimates below .30. The average c estimate for 
each test was less than .30. 

Thus, in light of Urry's guidelines and the earlier studies, examination of the 
item parameters obtained indicated that the parameter estimates obtained from 
Test 11 and Test 31 were similar to those obtained for items which had previous-
ly been used to improve achievement measurement; consequently, the items were
appropriate for investigating the AMT strategy. 

Conversion of the Mastery Level to the ICC Metric

The ICC item parameter estimates for each test were used in Equation 1 to 
obtain the TCC for each test. Figure 4 shows the resulting TCC for Test 11 
(Figure 4a), using item parameters for the 25 items that survived the cali- 
bration procedure, and for Test 31 (Figure 4b), based on the 38 items for which 
parameter estimates were available on that test. Conversions of the proportion
correct mastery levels (P=.7, .8, and .9) to the achievement metric (θ) are
also shown.

Test 11 had a slightly steeper TCC than did Test 31, reflecting the higher 
average discrimination of its items. The lower average b level of the Test 11 
items (i.e., easier items) is reflected in the fact that the TCC for Test 11 is 
shifted to the left along the achievement level, or θ, axis in comparison to
Test 31. The relatively equal average c parameters for the two tests are re-
flected in the values of the TCC at θ=-4.0.

For Test 11 the P=.7 mastery level was converted to θ=-.90 on the achieve-
ment metric, the P=.8 mastery level was converted to θ=-.23, and the P=.9 mas-
tery level was converted to θ=.75. For Test 31 the P=.7 mastery level was con-
verted to θ=-.48; the P=.8 level, to θ=.12; and the P=.9 level, to θ=.91 on the
achievement metric. It can be seen that for both tests the conversion was non-
linear, reflecting the gain in potential discriminability resulting from con-
sideration of the unique operating characteristics of each item.
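
The conversion from a proportion-correct mastery level to the achievement metric can be illustrated with the sketch below. It assumes that the TCC is the mean of the three-parameter logistic item response functions (with D = 1.7, an assumption) and inverts it by bisection; the item parameters shown are hypothetical, so the resulting θ values are not those reported for Test 11 or Test 31.

```python
import math

D = 1.7  # scaling constant; assumed here


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Expected proportion correct at theta: the mean of the item curves."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items) / len(items)


def theta_for_proportion(p_target, items, lo=-4.0, hi=4.0, tol=1e-6):
    """Invert the (monotone increasing) TCC by bisection on [lo, hi]."""
    if not tcc(lo, items) <= p_target <= tcc(hi, items):
        raise ValueError("target proportion is outside the TCC range searched")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < p_target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical (a, b, c) triples; not the calibrated estimates.
items = [(1.0, -1.0, 0.2), (1.5, 0.0, 0.2), (0.8, 1.0, 0.2)]
cutoffs = {p: theta_for_proportion(p, items) for p in (0.7, 0.8, 0.9)}
```

Equal steps in proportion correct generally map to unequal steps in θ, which is the nonlinearity noted above.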

Test Length 

Table 3 shows the mean number of items, the average amount of information 
obtained from each item administered, and the number of individuals from vari- 
ous subsamples under the AMT and conventional strategies at each of the three 
different mastery levels. The four subgroups for which these data are presented 
are (1) the total group of trainees, (2) the groups of trainees declared masters 
by the relevant testing procedure, (3) the groups of trainees declared nonmas-
ters by the relevant testing procedure, and (4) the groups of trainees for which
the AMT procedure made decisions with full confidence (i.e., trainees for whom
the mastery level, θ_m, fell outside the 95% confidence interval at some point
during the test and terminated the AMT procedure). Frequency distributions of
numbers of items administered for each of these subgroups are in Appendix Table
B-2 for Test 11 and Appendix Table B-3 for Test 31.

Figure 4

Test Characteristic Curves for Test 11 and Test 31, with Conversion
of Three Mastery Levels (P=.7, .8, and .9) from the Proportion-
Correct Metric to the Achievement Metric



Table 3 

Sample Size (N), Mean Test Length (L), and Mean Information Per Item (I)
for AMT and Conventional (Conv) Test for Tests 11 and 31 at
Three Mastery Levels for Total Group and Three Subgroups
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Total group. For the total group of trainees responding to Test 11, the 
AMT procedure reduced the average number of items administered (L) substan-
tially at every mastery level. The minimum reduction in number of items admin-
istered that was noted was for the P=.8 mastery level, where test length for
the conventional test was 25 items, compared to a mean test length for the AMT
procedure of L=17.4 items; this reduction of 7.6 items represents a minimum
test length reduction of 30.4% of the conventional test length. The maximum
test length reduction was 48.8% of the conventional test (12.2 items) when a
mastery level of P=.7 was used. For the same group of trainees, a gain in the
average amount of information (I) obtained from each item administered was
noted for the AMT procedure at the P=.7 and P=.9 mastery levels. The gains in
information per item administered were .03 information units (IU), or a 10%
increase, at the P=.7 mastery level, and .05 IU, or a 17% increase, at the
P=.9 mastery level.
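
The test length percentages quoted above follow from simple arithmetic on the conventional length and the mean AMT length. A minimal sketch is given below; the function name is hypothetical, and the inputs are the Test 11 figures reported in the preceding paragraph.

```python
def length_reduction(conventional_length, mean_amt_length):
    """Return (items saved, percent of the conventional test length saved)."""
    saved = conventional_length - mean_amt_length
    return saved, 100.0 * saved / conventional_length


# Test 11, total group, P=.8: 25 conventional items versus a mean of 17.4.
saved_p8, pct_p8 = length_reduction(25, 17.4)

# Test 11, total group, P=.7: a saving of 12.2 items implies a mean of 12.8.
saved_p7, pct_p7 = length_reduction(25, 25 - 12.2)
```

Rounded to one decimal place, these reproduce the 30.4% and 48.8% reductions stated in the text.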

For the total group of trainees responding to Test 31, the same two trends
were noted. First, test length was reduced with the use of the AMT procedure
at each mastery level. The minimum reduction of test length was noted with the
use of the P=.8 mastery level, for which the conventional test length of 38 items
was reduced to a mean AMT length of L=23.4 items, a reduction of 38.4% in mean
test length. The greatest reduction in test length was noted for the P=.9 mastery
level, at which the mean AMT length was 14.7, a reduction in test length of
61.3%.

The second trend was that the AMT procedure provided more information with
each item administered than the conventional test for all mastery levels. The 
smallest increase in information was .01 IU per item (a 5% increase), for the 
;"• m ?" ery level - The largest gain in the mean information per item was .08 
IU (a 36% increase), for the P-.7 mastery level. For mastery levels P=.8 and 
P=.9, the percent reduction in test length under AMT was greater for Test 31 
than that noted for Test 11. The increase in information per item noted for 
AMT was greater for Test 31 than for Test 11 at all three mastery levels.

Appendix Tables B-2 and B-3 show that test lengths for the AMT procedure 
for different trainees were quite variable. For most of the trainees, either 
a very long test (as long as the conventional test) was needed, or a very short
test (8 items or less) was sufficient. This U-shaped distribution of test 
lengths was obtained for both Test 11 and Test 31 across all mastery levels. 

Mastery groups. When only those trainees were considered who were judged 
to be masters for Test 11 at one of the mastery levels by the AMT or the con- 
ventional testing procedure, test length reduction was again noted for the AMT 
procedure at all three mastery levels. For mastery levels P=.7 and P=.8, adap- 
tive tests for those in the mastery group were approximately the same mean 
length as those for the total group; but for mastery level P=.9 adaptive tests 
for the mastery group were much longer (20.8 versus 13.1 items on the average). 
In comparison with the conventional test, for the AMT procedure in the mastery
group alone the minimum test length reduction was 4.2 items, or 16.8% of the
conventional test length of 25 items, at the P=.9 mastery level; and the maxi-
mum test length reduction was 12.7 items, or 50.8% of the conventional test
length, at the P=.7 mastery level.

The AMT procedure and the conventional testing procedure provided almost
identical mean amounts of information (I) for items administered to the mastery
groups, even though the AMT procedure administered fewer items at each mastery
level. However, for these groups interpretation of the differences in mean
information (I) is obscured by the fact that the two different testing proce-
dures gave trainees with different achievement levels mastery status. A clear-
er comparison of information provided by the two testing procedures is shown
below.

For the groups of trainees labeled as masters for Test 31, test-length re- 
duction was observed with the use of AMT for only two of the three mastery 
levels examined. At the P=.7 mastery level, mean test length was reduced by
18.6 items, or a reduction of 48.9% of the conventional test length, by use of 
AMT. For the P=.8 mastery level the mean test length was reduced by 10.3 items, 
or a reduction of 27.1% of the conventional test length. For the P=.9 mastery 
level the AMT procedure never reached a decision of mastery in less than 38 
items, the length of the conventional test. 

For Test 31, the AMT procedure resulted in higher mean information per 
item than the conventional test for the P=.7 mastery level (a difference of
.08 IU per item, or a 44% increase over the conventional test) and the P=.8 mas-
tery level (.05 IU per item higher, a 1\7 increase). At the P=.9 mastery level
the conventional test and the adaptive test administered items with equal aver- 
age information. 

As the mastery level became higher, for both Test 11 and Test 31 there was 
a trend for greater numbers of items to be administered before a decision of 
mastery could be made. This resulted from the fact that the higher mastery 
levels fell above the steepest portion of the TCCs, as is shown in Figure 4. 
This would imply that the entire conventional test would have more difficulty 
discriminating among trainees at these mastery levels; consequently, the AMT 
procedure would have to use more of the items from the conventional test in or- 
der to determine whether a trainee was above or below the higher mastery levels. 
This trend may be clearly seen in Appendix Tables B-2 and B-3. For each test, 
trainees were placed in the mastery group for mastery level P=.7 with a wide
range of test lengths. As the mastery level was raised, trainees were more 
likely to be declared masters only after a larger number of items were admin-
istered, until for Test 31 at the P=.9 mastery level, all those who were de-
clared masters took all of the items in the item pool before the mastery decision
was made.

Nonmastery groups. For the trainees who were declared nonmasters for Test 11
using either the adaptive or conventional testing procedures, reductions in
test length were observed at every mastery level with the AMT procedure. The
smallest reduction in test length, 7.6 items, was observed for the P=.8 mastery
level and accounted for 30.4% of the conventional test length. The largest re-
duction in test length was 13.5 items at the P=.9 mastery level, or 54% of the
conventional test length. At each mastery level for Test 11, more mean infor-
mation was obtained from each item administered to the nonmasters by the AMT
procedure than by the conventional procedure. The smallest increase in infor-
mation per item was .02 IU (a 6.7% increase), for the P=.8 mastery level. The
largest increase in mean information was .13 IU (a 33.3% increase) per item, for
the P=.7 mastery level.

For the trainees declared nonmasters for Test 31, reductions in mean test 
length were again noted with the AMT procedure at each mastery level. The min- 
imum mean decrease in test length was 11.6 items, or 30.5% of the conventional 
test length of 38 items, at the P=.7 mastery level. The maximum reduction in
average test length was 27.1 items, or 71.3% of the conventional test length,
at the P=.9 mastery level. As the criterion level increased, the number of
items needed by the AMT procedure to make the nonmastery decision steadily de-
creased.

For the nonmastery groups administered Test 31, the mean information per
item was higher at each mastery level for the AMT procedure than for the con-
ventional testing procedure. The minimum increase in information was .06 IU
(a 23% increase) per item administered, for the P=.8 mastery level; and the maxi-
mum increase observed was .08 IU per item (a 28.6% increase), for the P=.7 mas-
tery level.

Across both Tests 11 and 31, there was a tendency for the adaptive test
to administer fewer items before making a decision of nonmastery as the mastery
level increased. The sole exception to this trend was observed for Test 11 at
the P=.8 mastery level, which showed a slight increase in the number of items
administered when compared with the P=.7 mastery level for that test. For both
tests a higher mean information was obtained for each item administered by the
AMT procedure at each mastery level. No consistent trend was noted in the dif-
ferences in average information per item across mastery levels for the two tests.

High-confidence groups. The high-confidence groups included only those
trainees for whom the AMT procedure terminated with full confidence, i.e.,
trainees for whom the Bayesian confidence interval failed to include the mas-
tery level at some test length at or before the exhaustion of the items from
the conventional test item pool. For Test 11 the AMT procedure terminated with
high confidence for a minimum of 50% of the group of trainees, at the P=.8 mas-
tery level. The largest high-confidence group was 77% (N=154) of the total
group of trainees, at the P=.7 mastery level.

Test length was reduced considerably by the AMT procedure at all criterion
levels for the high-confidence groups. The minimum reduction in mean test length
was observed for the P=.8 mastery level and was 15.1 items, or 60.4% of the con-
ventional test length. The largest mean reduction in test length observed was
18 items, or 72% of the conventional test length, at the P=.9 mastery level.
Modal test length for the high-confidence groups for Test 11 at all mastery levels
was 3 items (see Appendix Table B-2), or only 12% of the length of the conven-
tional test (an 88% reduction). The AMT procedure produced greater mean infor-
mation per item at each mastery level. The smallest observed increase was .06
IU (a 20% increase) per item administered, for the P=.7 mastery level. The
largest mean increase was .15 IU per item (a 37.5% increase), at the P=.8 level.

For Test 31 the minimum number of trainees in the high-confidence group was 117,
or 58% of the total group, at the P=.8 mastery level. The largest high-confi-
dence group was 151, or 76% of the total trainee group, for the mastery level
P=.9.



Test length for the AMT procedure was much shorter than the conventional
test at each criterion level. The smallest reduction in mean test length was
24.9 items, or 65.5% of the conventional test length, for the P=.8 mastery lev-
el. The largest average reduction in test length was 30.8 items, or 81.1% of
the total conventional test length, for the mastery level. Similar to
Test 11, modal test lengths for Test 31 were quite short: 4 items for the P=.7
mastery level, 5 items for the P=.8 mastery level, and 3 items (for 57% of the
high-confidence group) at the P=.9 mastery level.

The AMT procedure produced higher mean information per item than the con- 
ventional testing procedure at all mastery levels. The minimum increase in 
mean information per item was .16 IU (an increase of 66.7% over the mean infor-
mation provided by the conventional test), for the P=.9 mastery level. The
maximum mean information increase that was observed was .23 IU per item (a 112%
increase), for the P=.7 mastery level.

For both Test 11 and Test 31 the AMT procedure made confident decisions for
between 50% and 77% of the total group at each mastery level. For the trainees
in the high-confidence groups, the average adaptive test length ranged from 19%
to 39% of the original conventional test length, while modal test lengths were
only 8% to 6% of the conventional test length (i.e., over 90% reduction). Also, 
the adaptive testing procedure resulted in a 20% to 119.5% increase in the mean
amount of information obtained per item over the conventional test. The in-
crease in mean information per item was greater for Test 31 than for Test 11
at all criterion levels.

Table 4 shows the Pearson product-moment (phi) correlations between the
decisions made by the AMT and conventional testing procedures across all three
criterion levels for Test 11 and Test 31. The lowest correlation observed was
.67, for Test 11 at the P=.9 mastery level. The highest correlation was .97,
for Test 31 at the P=.8 mastery level. The correlations between mastery de-
cisions for Test 31 were higher than for Test 11 at all mastery levels. In ad-
dition, the average decision variance in common between the two testing proce-
dures was 79% of the total decision variance.

Table 4

Phi Correlations Between Mastery Decisions Made by AMT and
Conventional Testing Procedures for Test 11 and Test 31, at Three
Mastery Levels

                Mastery Level
Test         P=.7    P=.8    P=.9
Test 11      .91     .88     .67
Test 31      .93     .97     .94

To examine more completely the correspondence in decisions made by the AMT
and conventional procedures, Table 5 shows joint frequency distributions of de-
cisions for the two testing procedures at each of the three mastery levels for
Test 11 and Test 31. The lowest level of agreement between the AMT and conven-
tional testing procedures was noted for Test 11 at the P=.9 mastery level, where
the two testing procedures agreed for 178, or 89.4% of the 199 trainees tested.
The highest level of agreement was 98.5%, for Test 31 at the P=.8 and P=.9 mas-
tery levels. Across both tests and all criterion levels, the two procedures
agreed for 95.9% of the trainees tested. For the longer test (Test 31) the two
procedures agreed for 97.9% of the trainees, and for the shorter test (Test 11)
the two procedures agreed for 94.0% of the trainees.
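
The agreement percentages and phi correlations can be recomputed from a 2 x 2 table of decisions. The sketch below does this for the Test 11, P=.7 cell counts in Table 5 (171, 3, 1, and 24) and also averages the squared phi coefficients from Table 4 to obtain the proportion of decision variance in common. The function and argument names are hypothetical, and equating "decision variance in common" with the mean squared phi is an assumption, although it reproduces the reported 79%.

```python
import math


def agreement_and_phi(both_master, amt_only, conv_only, both_nonmaster):
    """Proportion of identical decisions and the phi coefficient of a 2 x 2 table.

    amt_only: declared master by AMT but nonmaster by the conventional test.
    conv_only: declared master by the conventional test but nonmaster by AMT.
    """
    n = both_master + amt_only + conv_only + both_nonmaster
    agreement = (both_master + both_nonmaster) / n
    numerator = both_master * both_nonmaster - amt_only * conv_only
    denominator = math.sqrt(
        (both_master + amt_only) * (conv_only + both_nonmaster)
        * (both_master + conv_only) * (amt_only + both_nonmaster)
    )
    return agreement, numerator / denominator


# Test 11 at the P=.7 mastery level (counts taken from Table 5).
agreement, phi = agreement_and_phi(171, 3, 1, 24)

# Phi coefficients from Table 4; the mean of their squares is the shared variance.
table_4 = [0.91, 0.88, 0.67, 0.93, 0.97, 0.94]
mean_shared_variance = sum(r * r for r in table_4) / len(table_4)
```

For this cell the two procedures agree for 195 of 199 trainees, and the phi coefficient rounds to the .91 shown in Table 4.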

Table 5

Joint Distributions of Mastery Decisions Made by AMT and
Conventional Tests 11 and 31 at Three Mastery Levels

Mastery Level                Test 11                  Test 31
and AMT Decision       Mastery  Nonmastery      Mastery  Nonmastery

P=.7
  AMT Mastery            171         3            126         6
  AMT Nonmastery           1        24              1        67
P=.8
  AMT Mastery            126         1             72         2
  AMT Nonmastery          10        62              1       125
P=.9
  AMT Mastery             28         6             26         2
  AMT Nonmastery          15       150              1       171
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Figures 5 and 6 show the information obtained by Conventional Tests 11 and 
31, respectively, and adaptive testing procedures as a function of estimated
achievement level (θ). (Points plotted in these figures are based on mean infor-
mation obtained from trainees within a plus or minus .1 range around a given θ;
numerical values of information are shown in Appendix Table B-4.) Figures 5 and 
6 each show three adaptive testing information curves — one for each mastery 
level examined — and one conventional test curve. 
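
The way each plotted point is formed (mean information for trainees whose estimates fall within plus or minus .1 of a given θ) can be sketched as below. The record layout, the function name, and the data values are hypothetical and serve only to illustrate the averaging window.

```python
def mean_information_near(center, records, half_width=0.1):
    """Mean information over trainees whose estimate lies within the window.

    records: iterable of (theta_estimate, information) pairs, one per trainee.
    Returns None when no trainee falls inside the window.
    """
    nearby = [info for theta, info in records if abs(theta - center) <= half_width]
    return sum(nearby) / len(nearby) if nearby else None


# Hypothetical trainee records; not values from Appendix Table B-4.
records = [(-0.95, 4.0), (-0.88, 5.0), (-0.20, 7.5), (0.70, 3.0)]
point_low = mean_information_near(-0.9, records)
point_empty = mean_information_near(2.0, records)
```

A window with no trainees yields no plotted point, which is why such curves can only cover the observed range of achievement estimates.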

Figure 5 shows that Test 11 was poorly designed to make mastery decisions
at middle-range mastery levels (θ between -.5 and +.5, or proportion correct of
about P=.75 to P=.85), since the test's information was predominantly concen-
trated at low achievement levels (θ<-1.0), with an information spike caused by
a single highly discriminating item (Item 28; see Table 1) at about 1.0 on the
achievement continuum. Information functions for the AMT strategy at each of
the three mastery levels closely approximated the conventional information func-
tion in the region near each respective mastery level (θ=.8, -.2, -.9). In ad-
dition, as achievement level moved away from the mastery levels, the AMT infor-
mation functions fell below the information function for the conventional test,
particularly at the lower achievement levels. Further, as the difference be-
tween the achievement level and the mastery level increased, the difference in
amounts of information used by the AMT procedure and the conventional procedure
tended to become larger. However, for the P=.8 mastery level an upturn in the
information function occurred below the -1.3 achievement level, and the differ-
ence in information between the conventional and adaptive procedure decreased
slightly. The same type of upturn was noted for the P=.9 mastery level, for θ
levels below -1.1.

Figure 6 shows that for Test 31 the conventional test information function 
was monotone decreasing within the observed range of trainees' achievement levels.
This implies that Test 31 provided its most precise measurement at low achieve- 
ment levels and that differences between the two testing procedures should be 
most noticeable at low achievement levels. The AMT information functions for 
Test 31 in Figure 6 reinforce the trends noted in Test 11 for each of the mas-
tery levels. That is,

1. The AMT information functions each closely approximated the conventional
test information function in the region of the achievement continuum
near the appropriate mastery level.

2. For achievement levels beyond the region near the mastery level, the
AMT information function was lower than the conventional test infor-
mation function.

3. The difference in information between the AMT and conventional testing
procedures was greater for achievement levels further from the speci-
fied mastery level, up to a point.

4. At the lower end of the achievement continuum (θ<-.5), an increase in
the amount of information provided by the AMT procedure was noted for
each of the mastery levels examined. The point on the θ continuum at
which the upturn was noted was lower for each successively lower cri-
terion level.

For Test 31 one additional result was noted that did not appear in the Test 11 
AMT data: For both the P=.8 and P=.9 criterion levels, a final downturn in the
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Figure 5 

Mean Obtained Information as a Function of Estimated Achievement 
Level for AMT and Conventional Test 11 at Three Mastery Levels
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Figure 6 

Mean Obtained Information as a Function of Estimated Achievement 
Level for AMT and Conventional Test 31 at Three Mastery Levels
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information functions for the AMT procedure was observed at the lowest obtained 
θ levels. This implies that the observed upturns in information may have been
one side of an information spike, possibly caused by the minimum limit of three
items placed on the AMT procedure. 

Discussion and Conclusions



The unidimensional three-parameter logistic ICC model was fit to two con- 
ventional tests that were previously used to make mastery decisions in a mili- 
tary training course. Data originally gathered during the training course 
were used to evaluate, in real-data simulation, the efficiency of the proposed 
adaptive mastery testing (AMT) procedure in terms of the number of items admin- 
istered, the information obtained, and the degree of agreement between the AMT 
and conventional testing procedures. The AMT procedure was simulated assuming 
three different mastery levels, stated in terms of the achievement metric, 
through the use of the test characteristic curves (TCCs) for the two conven- 
tional tests. The results of these simulations indicated that the proposed 
AMT procedure reduced the number of items administered during the average test, 
while at the same time making decisions which were very much the same as those 
made by the conventional testing procedure. 
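
As a rough illustration of how a proportion-correct mastery level can be restated on the achievement metric through a TCC, the sketch below inverts a three-parameter logistic TCC by bisection. The item parameters, the 1.7 scaling constant, the search bounds, and the function names are all assumptions made for illustration; they are not the calibrated values for Tests 11 and 31.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected proportion correct at theta."""
    return sum(p_correct(theta, a, b, c) for a, b, c in items) / len(items)


def mastery_theta(p_level, items, lo=-6.0, hi=6.0, tol=1e-6):
    """Find the theta at which the TCC equals p_level, by bisection.

    Relies on the TCC increasing with theta (true when every a > 0).
    """
    if not tcc(lo, items) < p_level < tcc(hi, items):
        raise ValueError("mastery level is outside the range of the TCC")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < p_level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical (a, b, c) item parameters, for illustration only.
ITEMS = [(1.2, -1.0, 0.2), (0.9, -0.3, 0.25), (1.5, 0.2, 0.2), (1.1, 0.8, 0.2)]

for p in (0.7, 0.8, 0.9):
    print(p, round(mastery_theta(p, ITEMS), 3))
```

Because the TCC is monotone, each proportion-correct level maps to a single point on the achievement continuum, and higher proportion-correct levels map to higher achievement levels.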

The AMT procedure reduced the average test length for the entire group of 
trainees by 30% to 61% of the conventional test length. The reductions in test 
length observed varied across different mastery levels for both of the conven- 
tional tests. When specific subgroups of the samples were considered, mean 
test length reductions of up to 81% of the items in the conventional test were 
again observed in almost every subgroup examined at each mastery level and for 
both tests. The only subgroup for which no test length reduction was observed 
for the AMT strategy was the group passing Test 31 at the highest criterion lev- 
el (P=.90 correct). For the groups of trainees for which the AMT procedure was 
able to make high-confidence decisions, AMT mean test lengths were 60% to 81% 
shorter than the conventional tests across all mastery levels examined. Fur- 
ther, high-confidence decisions were made for 50% to 77% of the trainees at 
each mastery level. 

At each mastery level for each test, agreement was high between the deci- 
sions made by the adaptive and conventional testing procedures. The two pro- 
cedures made the same decision for approximately 96% of the cases across all 
circumstances. Using the larger item pool (Test 31), the two procedures agreed 
for about 98% of the cases. The lowest agreement level observed was
approximately 39%.

At each mastery level examined, the information functions observed for the
adaptive tests closely approximated the information functions obtained for the
relevant conventional test at achievement levels close to the mastery level, and
fell below the conventional test information functions for more extreme
achievement levels. For the achievement levels very different from the mastery
level, the difference between the information functions for the two testing
procedures reached a maximum; and at the most extreme achievement levels the
difference in information decreased slightly.

Thus, the AMT procedure was shown to make mastery decisions very similar 
to those made by the conventional testing procedure, while administering fewer 
items, by using the information in the item pool that was available to make 
high-confidence decisions. 

The test-length reduction observed using the AMT procedure may be attributed
to two characteristics of the procedure. First, the AMT strategy administered
to a trainee only those items which provided the most precise measurement
at the trainee's current level of θ̂. Second, the AMT procedure terminated the
test as soon as enough information was available to make a decision at a
predetermined level of confidence concerning the trainee's mastery level. The
termination rule allowed the test to terminate prior to the exhaustion of the
item pool, if enough information was available in the items, and the item
administration procedure presented the most informative items early in the
testing session.
Each of these characteristics of the AMT procedure can be more clearly seen
by examination of the Bayesian point estimates and the associated confidence
intervals obtained from a trainee's responses after each item administered by the
AMT and conventional testing procedures. One such record is shown in Figure 7
for a trainee responding to Test 11. The θ estimates plotted in Figure 7
include 95% Bayesian confidence intervals for the θ estimate after the first item
and after every third item administered thereafter for both AMT and conventional
procedures (even though the confidence interval was not used for making the
mastery decision with the conventional procedure).

Figure 7

Achievement Level Estimates for Trainee 14 after Each Item Administered by AMT
and Conventional Testing Procedures for Test 11, with 95% Bayesian Confidence
Intervals Indicated after Every Third Item (P=.7 Mastery Level)

It may be seen from Figure 7 that both testing procedures made a nonmastery
decision for the trainee (i.e., determined that the trainee's true achievement
level fell below the specified mastery level), even though both procedures
estimated the trainee's achievement level as being above the mastery level for
the first few items. The conventional test θ estimates were above the mastery
level for the first 7 items; the adaptive test θ estimates dropped below the
mastery level after only 2 items. The AMT procedure made the mastery decision
after administering 9 items, compared with the conventional test length of 25
items. At each test length greater than a single item, the Bayesian confidence
interval around the conventional test θ estimate was larger than the confidence
interval around the AMT θ estimate. This indicates the greater measurement
precision available to the AMT procedure due to the adaptive item administration
procedure.

Further, it may be noted in Figure 7 that the conventional test strategy
finally resulted in a Bayesian confidence interval that fell completely below
the mastery level after 19 items were administered (still over twice the test
length of the adaptive test); but since the conventional testing procedure does
not terminate even after this high-confidence level is reached, 6 more items
were administered before the test ended. This illustrative example showed that
the AMT procedure was far more economical than the conventional procedure in
terms of test length, due to the adaptive item selection procedure and the use
of the Bayesian confidence interval as a termination mechanism.

Additional Advantages of the AMT Strategy

The ICC-based adaptive mastery testing strategy described in this report
has several other advantages over conventional testing procedures used to make
mastery decisions. As has been demonstrated with these data, use of the ICC
metric and related achievement estimation procedures can result in mastery
decisions for most trainees (50% to 77%) with known and predetermined levels of
confidence. Coupled with appropriate design of mastery testing item pools using
ICC concepts, the percentage of high-confidence decisions could be substantially
increased until mastery decisions could be made for virtually all students at
the same high and predetermined level of confidence. Design of such mastery
testing item pools would include a concentration of highly discriminating items
around the mastery level, plus sufficient numbers of highly discriminating items
elsewhere along the achievement continuum to permit high-confidence decisions
to be made for all students. Actual numbers of items required at various
discrimination levels could be estimated using Owen's Bayesian scoring procedure
and information on the difficulties and discriminations of items to estimate in
advance the values of the Bayesian posterior variance (which is used to construct
the Bayesian confidence intervals used in the AMT procedure) at the expected
levels of θ.

If the mastery testing item pool is not designed in advance to permit
high-confidence decisions for each student, the AMT procedure still permits the
tester to determine the confidence level of each mastery decision made, even if
it is not a high-confidence decision. This can be determined by locating the
distance of the mastery level, θ_m, from the student's estimated achievement
level, θ̂. This distance can then be treated as a standardized deviation from the
mean of a normal distribution, with a variance equal to the estimated posterior
variance; and .50 plus the area of the portion of the normal distribution
included in that deviation will then give the confidence level for a given
mastery decision for that student. In this way, a confidence level for the
mastery decision can be attached to each such decision. As a result,
instructional decisions based on lower confidence level mastery decisions can be
made more tentatively.
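
The confidence-level computation just described can be sketched in a few lines. The function names and the example values are hypothetical; the only inputs are the achievement estimate, its estimated posterior variance, and the mastery level on the achievement metric.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def decision_confidence(theta_hat, posterior_var, theta_m):
    """Confidence attached to the decision implied by theta_hat vs. theta_m."""
    # Distance to the mastery level in posterior standard deviation units.
    z = abs(theta_hat - theta_m) / math.sqrt(posterior_var)
    # .50 plus the normal-curve area between the mean and z.
    return 0.5 + (normal_cdf(z) - 0.5)


# Hypothetical values: estimate .30, posterior variance .25, mastery level .80.
print(round(decision_confidence(0.30, 0.25, 0.80), 3))  # 0.841
```

In this example the estimate lies one posterior standard deviation below the mastery level, so the nonmastery decision carries a confidence of about .84. An estimate that coincides with the mastery level gives .50.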
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A further advantage of the ICC-based AMT strategy is that it can be exten- 
ded to the multiple-content area mastery testing problem with further savings in 
test administration time. In many training environments, it is desirable to 
measure mastery on a number of learning objectives at the same point in time. 
Using conventional testing procedures to measure mastery on 6 objectives, for
example, the student would have to take 6 different tests with a fixed number 
of items, for a potential total of over 100 items. However, since the AMT strat- 
egy utilizes the same item selection and scoring procedures that Brown and Weiss 
(1977) used in their intercontent branching adaptive testing strategy, the AMT 
strategy can operate in the same fashion; all that differs is the intrasubtest 
termination rule. Thus, in the multicontent branching AMT strategy, the achieve- 
ment level estimates used to make the mastery decisions in each of a number of 
content-based mastery tests would be used to serve as entry points for beginning 
testing (using appropriate multiple regression equations) in subsequent mastery 
tests in the battery. If there is any correlation between mastery decisions 
made on the separate subtests, the use of an intercontent branching AMT should 
result in substantial additional savings in testing time over that obtained by 
use of the AMT strategy in each subtest separately. 
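
The entry-point step of an intercontent branching AMT might look like the sketch below, in which the starting achievement estimate for the next subtest is predicted from the estimates obtained on earlier subtests. The regression intercept and weights shown are hypothetical placeholders; in practice they would come from regression equations fitted to data for the battery in question.

```python
def entry_point(previous_estimates, intercept, weights):
    """Predicted starting achievement estimate for the next subtest."""
    if len(previous_estimates) != len(weights):
        raise ValueError("one weight is needed per previous subtest estimate")
    return intercept + sum(w * t for w, t in zip(weights, previous_estimates))


# Hypothetical regression equation predicting subtest 3 from subtests 1 and 2.
print(entry_point([0.5, -0.25], intercept=0.1, weights=[0.4, 0.2]))
```

The closer the predicted entry point is to the trainee's actual achievement level on the new subtest, the fewer items should be needed before the confidence interval excludes the mastery level.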

The AMT procedure described above, or an improved version, should thus be 
extremely useful in a training sequence in which many subject areas are taught 
and tested within a short time, thus putting a premium on testing time. A
self-paced instructional setting in which a student is given more than one
attempt to demonstrate mastery of a content area with a single test may also
benefit from an AMT procedure that would allow students to take different items
on each attempt, thus avoiding the problem of students merely "learning" the
test, without learning the subject matter.

The AMT procedure should be tested in an actual classroom situation. Further
research should also be conducted to determine whether conventional mastery
testing or the AMT procedure results in mastery decisions which more accurately
predict external performance criteria.

Appendix A



Illustration of the Procedure for Choosing Items for AMT

The essential characteristics of the adaptive testing strategy employed in 
this study have been described in previous sections. However, to understand the 
method more completely, it is helpful to see the results of its application with 
an actual testee. 

Figure A-1 shows estimated item information curves for six items from Test 1.
(There would probably be many items in the test, but only six were chosen to
simplify the illustration.) The height of the information curve at a given
achievement level (θ) indicates the amount of information provided by the item.
Most of the items are fairly "peaked"; that is, they provide information over a
relatively narrow range of the achievement continuum. While the information
curves overlap to some degree, different items provide different amounts of
information at a given point on the achievement continuum. The guiding principle
for the adaptive procedure is to administer the item which provides the most
information at the current achievement estimate (θ̂).
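
To make the notion of a "peaked" information curve concrete, the sketch below evaluates the standard three-parameter logistic item information function for two items of equal difficulty but different discrimination. The item parameters and the 1.7 scaling constant are assumptions chosen for illustration and are not taken from the calibrated Test 1 items.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information at theta under the three-parameter logistic model."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


# Hypothetical items (a, b, c): same difficulty, different discrimination.
PEAKED = (2.0, 0.0, 0.2)
FLAT = (0.6, 0.0, 0.2)

for theta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(theta,
          round(item_information(theta, *PEAKED), 3),
          round(item_information(theta, *FLAT), 3))
```

The highly discriminating item provides far more information near its difficulty level and very little elsewhere, while the weakly discriminating item provides a small amount of information over a wide range.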

Figure A-1

Estimated Item Information Curves for Six Items from Test 1
(horizontal axis: Achievement Level, θ)
For a testee beginning Test 1, the initial achievement estimate was θ̂=0;
this is shown by the vertical dashed line in Figure A-1. Of the six items in
the example, only three items had essentially nonzero information values at θ̂=0;
these values, shown by the horizontal dotted lines in Figure A-1, were .95 for
Item 5, .60 for Item 15, and .10 for Item 12. Applying the rule that the item
selected is the one which provides the most information at the current θ̂,
Item 5 would be selected for administration.

Figure A-2 shows the revised value of θ̂=.46 derived from the Bayesian
scoring routine, assuming that a correct answer was given to Item 5. The
confidence interval surrounding this θ̂ is assumed to contain the mastery level,
so testing would continue. The information curve for Item 5, which was already
administered, is not shown in Figure A-2. At the new value of θ̂, only
Items 15 and 12 provide significant values of information. Since Item 15 has
an information value of .60 and Item 12 has a value of .20, Item 15 would be
selected as the second item to be administered to this testee.
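
The first two selection steps of this example can be retraced with the information values quoted in the text. The table layout and the function name below are hypothetical conveniences; only the information values and item numbers come from the example.

```python
# Information values quoted in the text, keyed by achievement estimate.
INFO = {
    0.0: {5: 0.95, 15: 0.60, 12: 0.10},
    0.46: {15: 0.60, 12: 0.20},
}


def select_item(info_at_theta, administered):
    """Pick the not-yet-administered item with the most information."""
    candidates = {i: v for i, v in info_at_theta.items() if i not in administered}
    return max(candidates, key=candidates.get)


administered = []
for theta_hat in (0.0, 0.46):
    administered.append(select_item(INFO[theta_hat], administered))

print(administered)  # [5, 15]
```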



Figure A-2

Estimated Item Information Curves for Five Items from Test 1

Assuming that the testee had correctly answered Item 15, the value of θ̂
would increase to .92. The confidence interval around this new θ̂ still contains
the arbitrary mastery level, so testing would continue. At θ̂=.92, only Item 12
would provide significant amounts of information, and it would be administered
next. Thus, at each step during the testing procedure, the item which provides
the most information concerning the testee's current level of θ̂ is administered.
In a larger item pool, testing would continue in this fashion until it was
possible to make a mastery decision with a prespecified level of confidence, at
which point the test would terminate.

Appendix B
Supplementary Tables



Table B-1

Eigenvalues of the First 10 Common Factors Extracted
from Item Intercorrelations for Test 11 and Test 31,
and for Parallel Random-Data Factors

                Test 11               Test 31
            Real     Random       Real     Random
Factor      Data      Data        Data      Data

   1        6.14      1.75       10.23      2.04
   2        1.85      1.61        3.08      1.90
   3        1.60      1.52        2.10      1.84
   4        1.41      1.51        2.06      1.82
   5        1.38      1.50        1.82      1.76
   6        1.30      1.39        1.68      1.72
   7        1.24      1.33        1.58      1.62
   8        1.16      1.28        1.49      1.58
   9        1.15      1.25        1.38      1.56
  10         .97      1.20        1.31      1.48






Table B-2

Frequency Distributions of Number of Items Administered by
AMT Procedure from Test 11 by Mastery Subgroup for
Each Mastery Level (P=.7, .8, and .9)

                                              Group
Number of                                                                 High
Items              Total            Mastery          Nonmastery        Confidence
Administered   P=.7 P=.8 P=.9    P=.7 P=.8 P=.9    P=.7 P=.8 P=.9    P=.7 P=.8 P=.9

       3         43   54   45      39   39            4   15   45      43   54   45
       4          1    1   24                         1    1   24       1    1   24
       5         36    1   10      36                      1   10      36    1   10
       6          1         3                         1         3       1         3
       7         10    3   17      10         8            3    9      10    3   17
       8         13    2    2      13    1                 1    2      13    2    2
       9          3         2                         3         2       3         2
      10                    1                                   1                 1
      11          7         6       7                           6       7         6
      12          1         3                         1         3       1         3
      13          3    2            3                      2            3    2
      14          1    2    1       1                           1       1    2    1
      15               3    2            2                 1    2            3    2
      16          1    1    1       1                      1    1       1    1    1
      17          1    7    4            6            1    1    4       1    7    4
      18          7         2       7                           2       7         2
      19          4    6    1       1                 3    6    1       4    6    1
      20          4                 4                                   4
      21               1    3                              1    3            1    3
      22               2    2            2                      2            2    2
      23          1         3       1                           3       1         3
      24          7    9            7    8                 1            7    9
      25         55  105   67      44   68   26      11   37   41      10    6
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Table B-3 

Frequency Distributions of Number of Items Administered by AMT Procedure
from Test 31 by Mastery Subgroup for Each Mastery Level (P=.7, .8, and .9)

                                              Group
Number of                                                                 High
Items              Total            Mastery          Nonmastery        Confidence
Administered   P=.7 P=.8 P=.9    P=.7 P=.8 P=.9    P=.7 P=.8 P=.9    P=.7 P=.8 P=.9



J 




7 


86 






7 


7 
/ 


RA 


7 
/ 


7 


o u 


4 


27 


1 


11 
X X 


27 






1 

X 


1 1 

X X 


27 


1 

X 


l l 


5 


4 • 


15 


7 






4 


1 s 

X J 


7 


A 


1 S 

X J 


7 

f 


6 


8 ' 


6 


A 


7 




1 


A 

u 


A 

u 


ft 

o 


A 


A 

D 


7 


10 


10 


4 


1 0 


7 






A 


1 0 

X V 


1 0 

X \J 


A 


8 


» 6 


4 


2 


5 




1 


A 


? 


A 


A 


2 


9 


6 


3 


1 


5 




1 


3 


1 

L 


A 
u 


3 


l 

X 


10 


7 


- 6 


7 


5 


4 


2 


2 


7 


7 




7 


11 


1 


10 


4 




3 


1 


7 


4 


] 


10 


4 


12 


1 


4 


3 


1 


1 




j 


3 


1 

X 


A 


j 


13 * 


2 


g 


2 


2 






8 


2 


2 


ft 

O 


2 


14 


5 


1 


2 


4 




1 


1 


2 


5 


1 


2 


15 


3 


2 




3 






2 




3 


2 




16 


5 


2 


2 


5 






2 


2 


5 


2 


2 


17 


2 fc 


6 




1 


5 


1 


1 




2 






18 


3 


6 




1 


5 


2 


1 




3 






19 


4 


4 


1 


1 


3 


3 


1 


1 


4 


4 


1 


20 


2 


1 




1 




1 


1 




2 


1 




21 


3 


4 




2 




1 


4 




3 


4 




?2 






I 










2 






2, 


23 


5 


2 




4 




1 


2 




5 


2 




24 


1 


5 


1 


1 


2 




3 


1 


1 


5 ' 


1 


25 


1 


1 




1 






1 




1 


1 




26 
























27 


1 


1 




1 


1 








1 


1 




28 




1 


1 




1 






1 




1 


1 


29 


3 


1 






1 


3 






3 


1 




30 


1 




2 






1 




2 


1 




2 


31 


2 


1 




1 




1 


1 




2 


1 




32 




1 










1 






1 




33 




1 


1 




1 






1 




1 


1 


34 






2 










2 






2 


35 


i 






1 










1 






36 


1 


2 


2 






1 


2 


2 


1 


2 


2 


37 




1 


2 








1 


2 




1 


2 


38 


78 


83 


49 


43 


40 28 


35 


43 


21 
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Table B-4 

Mean Information (I) Obtained by AMT and Conventional Testing Procedures for Tests 11 and 31
at Three Mastery Levels (P=.7, .8, and .9) for Trainees with Various Achievement Level
Estimates (θ̂), and Number of Trainees (N) at Each Achievement Level



Test 11 



AMT 



Range Conventional (P«. 7) 



(F-.8) 



Lo 




Hi 


I 


/V 


-2.000 




.800 




■ i J 


1 


-1.799 


-1 


.600 


1 1 
1 1 


ftft 
■ oo 


5 


-1.599 


-1 


.400 


1 1 

X. X 


59 


3 


-1.399 


-1 


.200 


1 0 

X w 


Sft 

• -JO 


3 


-1.199 


-1 


.000 


Q 

o 


87 


10 


-.999 




.800 


7 


A 7 


4 


-.799 




. 600 


6 


.61 


X L. 


-.599 




.400 


5 


.41 


10 


-.399 




.200 


4 


.65 


14 


-.199 




.000 


4 


.04 


21 


.001 




.200 


3 


.73 


15 


.201 




.400 


3 


.72 


19 


.401 




.600 


4 


.91 


19 


.601 




.800 


8 


.86 


22 


.801 


1 


.000 


17 


.CI 


16 


1.001 


1 


.200 


16 


.45 


9 


1.201 


1 


.400 


6 


.35 


3 


1.401 


1 


.600 


3 


70 


3 


1.601 


1 


.800 


1 


63 


5 


1.801 


2 


000 









I 



7.42 
10.27 
10.46 
8.93 
7.68 
6.63 
5.51 
4.73 
3.79 
2.69 
2.21 
4.69 
8.80 
14.40 



1.29 39 



I 



2. 

7 

4 

6 

6 
10 

9 

9 
16 
24 
47 

4 
11 

1 



6 


.20 


7 


4 


.47 


4 


.46 


12 


3 


.19 


6 


.56 


4 


1 


.60 


7 


.17 


11 






6 


.69 


10 




.34 


5 


.56 


10 


2 


.53 


4 


.68 


18 


3 


05 


4 


03 


18 


3 


68 


3 


vt- 


"20 


3 


64 


3 


71 


9 


3 


73 


4 


60 


9 


4. 


56 


8. 


37 


16 


9. 


41 


15. 


81 


• 9 


16. 


09 


13. 


94 


3 


17. 


33 








6. 


10 


1. 


29 


39 


3. 


16 








1. 


41 



47 
25 
20 
10 
15 
13 
12 
9 
9 
7 
2 
3 
8. 



Test 31 



AMT 



(P=.9) C onventional (P=. 7) 



r 



(P°-8) 



23.44 
18.08 
15.53 
12.31 
10.11 
9.30 
8.79 
8.46 
8.06 
7.58 
6.96 
6.48 
6.05 
5.78 
5.24 
4.89 
4.05 
2.94 



N 



2 
1 
10 
9 
18 
21 
15 
'19 
18 
12 
15 
16 
7 
6 
11 
6 
4 
5 



19.58 
10.10 
9.36 
9.19 
10.07 
9.35 
8.78 
8.43 
8.06 
7.57 
5.83 
3.88 
2.60 
2.93 
1.97 



1 

8 
12 
11 

7 
21 
15 
17 
15 

4 
15 
23 
18 

5 
27 



4.82 
7.23 
6.57 
4.71 
5.80 
7.61 
8.40 
8.04 
7.51 
6.95 
6.50 
6.00 
5.61 
4.67 
3.51 
2.20 



4 
10 
8 
22 
28 
12 
17 
17 
12 
12 
14 
6 
9 
13 
8 
7 



(P-.9) 



I 



8.00 

10.36 
2.17 



.24 
.70 
.96 
.10 



6.89 
6.51 
6.00 
5.74 
5.18 
4.86 
3.84 
3.41 
2.13 



N 



4 
25 
24 
58 
22 
11 
4 
8 
6 
6 
10 
6 
3 
1 
5 
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